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Abstract 

Linear dimensionality reduction techniques are powerful tools for image analysis as they allow 
the identification of important features in a data set. In particular, nonnegative matrix factorization 
(NMF) has become very popular as it is able to extract sparse, localized and easily interpretable 
features by imposing an additive combination of nonnegative basis elements. Nonnegative matrix 
underapproximation (NMU) is a closely related technique that has the advantage to identify features 
sequentially. In this paper, we propose a variant of NMU that is particularly well suited for image 
analysis as it incorporates the spatial information, that is, it takes into account the fact that 
neighboring pixels are more likely to be contained in the same features, and favors the extraction of 
localized features by looking for sparse basis elements. We show that our new approach competes 
favorably with comparable state-of-the-art techniques on synthetic, facial and hyperspectral image 
data sets. 

Keywords, nonnegative matrix factorization, underapproximation, sparsity, hyperspectral imag¬ 
ing, dimensionality reduction, spatial information. 

1 Introduction 

Linear dimensionality reduction (LDR) techniques are powerful tools for the representation and anal¬ 
ysis of high dimensional data. The most well-known and widely used LDR is principal component 
analysis (PCA) [H]. When dealing with nonnegative data, it is sometimes crucial to take into account 
the nonnegativity in the decomposition to be able to interpret the LDR meaningfully. For this reason, 
nonnegative matrix factorization (NMF) was introduced and has been shown to be very useful in sev¬ 
eral applications such as document classification, air emission control and microarray data analysis; 
see, e.g., 13 and the references therein. Given a nonnegative input data matrix M G and a 

factorization rank r, NMF looks for two matrices U G and V G such that M UV. Hence 

each row M(i,:) of the input matrix M is approximated via a linear combination of the rows of V: 
for 1 < i < n, 

r 

k=i 
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In other words, the rows of V form an approximate basis for the rows of M, and the weights needed 
to reconstruct each row of M are given by the entries of the corresponding row of U. The advantage 
of NMF over PCA (that does not impose nonnegativity contraints on the factors U and V) is that 
the basis elements V can be interpreted in the same way as the data (e.g., as vectors of pixel inten¬ 
sities; see Section for some illustrations) while the nonnegativity of the weights in U make them 
easily interpretable as activation coefficients. In this paper, we focus on imaging applications and, in 
particular, on blind hyperspectral unmixing which we describe in the next section. 

1.1 Nonnegative Matrix Factorization for Hyperspectral Images 

A hyperspectral image (HSI) is a three dimensional data cube providing the electromagnetic reflectance 
of a scene at varying wavelengths measured by hyperspectral remote sensors. Reflectance varies with 
wavelength for most materials because energy at certain wavelengths is scattered or absorbed to dif¬ 
ferent degrees, this is referred to as the spectral signature of a material; see, e.g., m- Some materials 
will reflect the light at certain wavelengths, others will absorb it at the same wavelengths. This 
property of hyperspectral images is used to uniquely identify the constitutive materials in a scene, 
referred to as endmembers, and classify pixels according to the endmembers they contain. A hyper¬ 
spectral data cube can be represented by a two dimensional pixel-by-wavelength matrix M G 
The columns G of M (1 < j < m) are original images that have been converted into 

n-dimensional column vectors (stacking the columns of the image matrix into a single vector), while 
the rows M(i,:) G of M (1 < i < n) are the spectral signatures of the pixels (see Figure [^. 
Each entry Mij represents the reflectance of the i-th pixel at the j-th wavelength. Under the linear 
mixing model, the spectral signature of each pixel results from the additive linear combination of the 
nonnegative spectral signatures of the endmembers it contains. In that case, NMF allows to model hy- 
perspectal images because of the nonnegativity of the spectral signatures and the abundances: Given 
a hyperspectral data cube represented by a two dimensional matrix M G NMF approximates 

it with the product of two factor matrices U G and V G such that the spectral signature 

of each pixel (a row of matrix M) is approximated by the additive linear combination of the spec¬ 
tral signatures of the endmembers (rows of matrix U), weighted by coefficients Uik representing the 
abundance of the fc-th endmember in the i-th pixel. For all z, we have: 

r 

( 1 . 1 ) 

k=l 

where r is the number of endmembers in the image. The matrix U is called the abundance matrix 
while the matrix V is the endmember matrix. Figure illustrates this decomposition on the Urban 
hyperspectral data cube. 

Unfortunately, as opposed to PCA, NMF is a difficult problem (NP-hard) [22]. Moreover, the 
decomposition is in general non-unique and has to be recomputed from scratch when the factorization 
rank r is modifled. For these reasons, a variant of NMF, referred to as nonnegative matrix under ap¬ 
proximation, was recently proposed that allows to compute factors sequentially; it is presented in the 
next section. 

1.2 Nonnegative Matrix Under approximation 

Nonnegative matrix underapproximation (NMU) |H] was introduced in order to solve NMF sequentially, 
that is, to compute one rank-one factor U(:, fc)U(A;,:) at a time: first compute [/(:, 1)U(1,:), then 
[/(:, 2)U(2,:), etc. In other words, NMU tries to identify sparse and localized features sequentially. 
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Figure 1: Decomposition of the Urban hyperspectral image from http://www.agc.army.mil/, constituted 
mainly of six endmembers (r = 6): road, grass, dirt, two kind of roof tops and trees. Each row of the matrix 
V is the spectral signature of an endmember, while each row of the matrix U is the abundance map of the 
corresponding endmember, that is, it contains the abundance of all pixels for that endmember. 


In order to keep the nonnegativity in the sequential decomposition, it is natural to use the following 
upper bound constraint for each rank-one factor of the decomposition: for all 1 < fc < r, 

e(:, k)V{k, U{:,p)Vip ,:). 

p<k 

Hence, given a data matrix M G NMU solves, at the first step, the following optimization 

problem: 

min IIM — uv^W such that uv^ < M. 

referred to as rank-one NMU. Then, the nonnegative residual matrix R — M — > 0 is computed, 

and the same procedure can be applied on the residual matrix R. After r steps, NMU provides a 
rank-r NMF of the data matrix M. Compared to NMF, NMU has the following advantages: 

1. As PC A, the solution is unique (under some mild assumptions) [TOj . 

2. As PCA, the solution can be computed sequentially, and hence the factorization rank does not 
need to be chosen a priori, and 

3. As NMF, it leads to a separation by parts. Moreover the additional underapproximation con¬ 
straints enhance this property leading to better decompositions into parts [8] (see also Section]^ 
for some numerical experiments). 

NMU was used successfully for example in hyperspectral m and medical imaging ffsum, and for 
document classification [3j. 
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1.3 Outline and Contribution of the Paper 

Modifications of the original NMU algorithm were made by adding prior information into the model, 
as it is also often done with NMF; see, e.g., |3] and the references therein. More precisely, two variants 
of NMU have been proposed: 

1. One adding sparsity constraints on the abundance matrix, dubbed sparse NMU HU, and 

2. One adding spatial information about pixels, dubbed spatial NMU [T2] . 

In this paper, we include both sparsity constraints and spatial information in NMU. This allows us to 
extract localized features in images more effectively. We present our algorithm in Section and show 
in Section that it competes favorably with comparable state-of-the-art techniques (NMF, sparse 
NMF and several variants of NMU) on synthetic, hyperspectral and facial image data sets. 


2 Algorithm for NMU with Priors 

In this section, we describe our proposed technique that will incorporate both spatial and sparsity 
priors into the NMU model. This will allow us to extract more localized and more coherent features 
in images. 


2.1 Reformulation of NMU and Lagrangian-Based Algorithm 

First, let us briefly describe the original NMU algorithm from [8]. The original rank-one NMU problem 
is the following 

min ||Tf — such that uv^ < M. (2.1) 

As described in the introduction, solving this problem allows to compute NMF sequentially, that 
is, one rank-one factor at a time while preserving nonnegativity. In |8], approximate solutions for 
rank-one NMU are obtained using the Lagrangian dual 

max min L (u.v. A) = moix min \\M — + 2^^ (uv^ — M) .. Aij. (2.2) 

A>0n>0,i;>0 ^ ^ A>0 n>0,i;>0 " ^^ ^ 


where A G is the matrix containing the Lagrangian multipliers of the underapproximation 

constraints. The authors prove that for a fixed A, the problem mmu>o^v>o L (i/, A), called Lagrangian 

, is equivalent to min^i>o,i;>o || (^ — A) — ||^ which is equivalent, up to a scaling 

of the variables, to 


Relaxation of (2.1 


max u^{M — A)v such that H^^Ho ^ ^ 1* (2-3) 

n>0,'f;>0 


In fact, one can show that any optimal solution of (2.2) is, up to a scaling factor, an optimal solution 
of (2.3), and vice versa. It has to be noted that, except for the trivial stationary point (that is, 
u = 0 and v = 0^ which is optimal only for M — A < 0), any stationary point of (2.3) will satisfy 
ll'^lb = Ibib = 1 (this can be shown by contradiction). Therefore, without loss of generality, one can 
assume \\u \\2 = 1 and ||?;||2 = 1. 

Based on these observations, the original NMU algorithm optimizes alternatively over the variables 
u and 'u, and updates the Lagrangian multipliers accordingly. The advantage of this scheme is that it 
is relatively simple as the optimal solution of u given v can be written in closed form (and vice versa). 
The original NMU algorithm iterates between the following steps: 


4 






o u ^ max (0, (M — A)'?;), u ^ i //||^/||2 , 
o V ^ max (O, (M — v ^ '?^/||'?^||2 , 

o A ^ max (O, A + ji^auv^ — M)) where a — u^{M — K)v and /i is a parameter. 

Note that given u and a — u^{M — A)v is the optimal rescaling of the rank-one factor uv^ to 
approximate M — A. The parameter jji can for example be equal to j where t is the iteration index 
to guarantee convergence [8]. Note also that the original NMU algorithm shares similarities with the 
power method used to compute the best rank-one unconstrained approximation; see the discussion 
in [To] . 

2.2 Spatial Information 

The first information that we incorporate in NMU is that neighboring pixels are more likely to be 
contained in the same features. The addition of spatial information into NMU improves the decom¬ 
position of images by generating spatially coherent features (e.g., it is in general not desirable that 
features contain isolated pixels). In this paper, we use the anisotropic total variation (TV) regular¬ 
ization; see, e.g., m- We define the neighborhood of a pixel as its four adjacent pixels (above, below, 
left and right). To evaluate the spatial coherence, we use the following function 

n 

Y, E k-«il = 2||iV«||i, (2.4) 

^=1 jeN{i) 


where M {i) is the set of neighboring pixels of pixel i, and N G is a neighbor matrix where each 

column corresponds to a pixel and each row indicates a pair (i, j) of neighboring pixels: 

N{k^i) — 1 and V(A:, j) = —1, (2.5) 

with 1 < i < j < m and j G A/"(i), and K is the number of neighboring pairs {K < 4n, because each 
pixel has at most 4 neighbors). The term ||V?i||i therefore accounts for the distances between each 
pixel and its neighbors. Note that the norm is used here because it is able to preserve the edges in 
the images, as opposed to the ^2 norm which would smooth them out; see, e.g., [23 ESI. The same 
penalty was used in m to incorporate spatial information into NMU. 


2.3 Sparsity Information 

The second information incorporated in the proposed algorithm is that each feature should contain 
a relatively small number of pixels. In hyperspectral imaging, this translates into the fact that each 
pixel usually contains only a few endmembers: the row vectors of the abundance matrix U should 
have only few non-zero elements. In HU, the authors demonstrated that incorporating this sparsity 
prior into NMU leads to better decompositions. They added a regularization term to the objective 
function in (2.1) based on the ^i-norm heuristic approach, in order to minimize the non-zero entries 
of ?i: III/111 where \\u \\2 = 1. This is the most standard approach to incorporate sparsity and was also 


used to design sparse NMF algorithms m- 
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2.4 Algorithm for NMU with Spatial and Sparsity Constraints 

In order to inject information in the factorization process, we add the regularization terms for the 


sparsity and spatial information in the NMU formulation (2.3): 

/ 


max 


{M — A)v — (p ||?i||^ — /i II Ai 
^- 

\ =f{u,v) 




/ 


( 2 . 6 ) 


The objective function is composed of three terms: the first one relates to the classical least squares 
residual, the second enhances sparsity of the abundance vector u while the third improves its spatial 
coherence. The regularization parameters fi and (f are used to balance the influence of the three terms. 
We will refer to ( |2.6[ ) as Prior NMU (PNMU). Note that we relax the constraint 111^112 = 1 from NMU 
to ||i^||2 < 1 to have a convex optimization problem in u; see below for more details. 

Algorithm formally describes the alternating scheme to solve the NMU problem with sparsity 
and spatial constraints ( |2.6[ ). 

As in the original NMU model, the optimal solution for v given u can still be written in closed form. 
However, because of the spatial information (the term ||Ai^||i), there is no closed form for the optimal 
solution of u given although the problem is convex in u. In order to find an approximate solution 
to that subproblem, we combine iterative reweighted least squares with a standard projected gradient 
scheme from m, as proposed in d; see below for more details. Finally, a simple block-coordinate 
descent scheme is used to find good solutions to problem (2.6). This is achieved by applying the 


following alternating scheme that optimizes one block of variables while keeping the other fixed: 


1. V 


max 


2 . u 


(^{M — A)^ ,'L’^'L ’/||'?;||2 (lines 


V {u — jVfw ('?i)) (from line 12 to line[T^ 


21 


and 


22 ) 


where L is the Lipschitz constant of V fw {u)^ where is a smooth approximation of /; see the 


paragraph below and Equation (2.8). 


The projection is defined as 

[max (0, s) otherwise. 

3. A ^ max (O, A + /i (^auv^ — M)) with a = u^Mv (from line 
Variables (u^v^A) are initialized with a solution of the original NMU algorithm from [TT] (line[^. 


24 


to line 


30) 


Smooth approximation of / The variables v and A are updated as in the original NMU algorithm, 
whilst the update of u is modified in order to take into account the penalty terms. The update of 
u involves the non-differentiable £i-norm term ||Vi/||i in the objective function /. Note that, since 
^ 0, ||ix||i = hence it is differentiable on the feasible set. The authors in [12] suggested to 

use iteratively re-weighted least squares (IRWLS) to approximate the £i-norm. At the t-th iteration, 
the term ||Vi/||]^ is replaced with 


IIVi/||]^ 


—^—1 




(2.7) 
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Algorithm 1 Prior NMU: incorporating spatial information and sparsity into NMU 


Require: M G M!|; 
Ensure: ([/, V) G 


\ r G N+, 0<(/^'<l,0</i'<l, small parameter e > 0, maxiter, iter. 

X s.t. UV < M with U containing sparseness and locality information. 


5 

6 

7 
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12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 

34 

35 


Generate the matrix N according to (2.5); 

for k = 1 : r do 

[x^y^A] — rank-one underapproximation(M); % Initialization of {x^y) with an approximate 
solution to NMU ( |2.1[ ) 


Wi — {\Nx\ - -he) ] W — diag(rc); % Initialization of IRWLS weights 
^ = (p' II (M — A) y\\^ % Setting the sparsity parameter p 
X = max (0, {x — p)) ; x = 

z = rand(m, 1); % Estimate of the eigenvector of B associated with the largest eigenvalue 
for t — 1 \ maxiter do 
A = A; 

% Update of x 
B = {WN)^ {WN ); 

for / = 1 : iter 2 ; = Bz] z = %Power method 

IGII2 

for / = 1 : iter do 

% Setting of the spatial parameter fi 


y 


_ / ll-^yllo 

\\Bx\\^ 


L — max (e, /i % Approximated Lipschitz constant 

V/(x) = Ay- yBx - cp; 

X {Lx -h V/ {x )); 
end for 
% Update of y 
y ^ max (O, AA-x) ; 
if ||y ||2 ^ 0 then y ^ 

%Update of A and save {x^y) 
if X ^ 0 and y A ^ then 
<T = x^Ayyuk ^ X] Vk-^ cry, 

else 


max 


(0,A - ^ (M-nfcnJ)) 


A ^ A. 

^ 2 5 


X 


Uk 


1K112’ y K112’ 

end if 

% Update of the weights 

Wi = {\Nx\ - -h ; W = diag(ic); 

end for 

M = max ( 0 , M — uj^v^) ; 

end for 


Vk 


where = diag diag(x) takes a vector x as an input and returns a diagonal matrix with 

/ X _ 1 

the entries of x on the diagonal, and ^ -h 6:) ^ (lines 12 

approximate the £i-norm of a vector 2 ; G as a weighted ^ 2 -norm: 


and 32). The idea is to 


Ulli 


n 

i=l 


with W4 — 


1 


\zi\+e 
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where e is a small constant (we used 10“^). In our case, the weights are estimated using the value 
of u at the previous iteration. We denote the modified objective function that approximates / 
at the current iterate. Hence the non-differentiable term ||A/^?i||i is approximated with a convex and 
quadratic differentiable term u^Bu^ and we can use a standard gradient descent scheme from smooth 
convex optimization [20] (line HH); the gradient being 

V/^(; (u) = (M — A)v — ipe — fi {Bu) , (2.8) 


where e is the vector of all ones of appropriate dimension. The Lipschitz constant L oiV is equal 
to the largest singular value of B. Since the matrix B is rather large (but sparse), it is computational 
costly to compute exactly its largest eigenvalue, hence we use several steps of the power method (line 


13) to estimate the Lipschitz constant L. 


Choice of the penalty parameters (p and ji It is in general difficult to choose a priori ‘optimal’ 
values for the penalty parameters p (sparsity parameter) and p (local information). In fact, it is 
difficult to know the sparsity and spatial coherence of the unknown localized features. Moreover, it is 
difficult to relate them to the parameters p and p. Note that this choice also depends on the scaling 


of the input matrix: if M is multiplied by a constant, the first term in (2.6) is increased by the same 
constant. 

In this paper, we use scaling-independent parameters: we replace p and p with (lines [T^ and [^: 


p — p 


MM-A)v\ 




and p = p'\\{M — A)v\ 


oo ’ 


for some p' G [0,1] and p' G [0,1]. These choices were proposed respectively in [T2| and m- They 
have the advantage to give a range of possible values for the penalty parameters: 

o (/p' = 1 is the highest possible sparsity level: it would imply that the largest value of x is set to 
zero by the thresholding operator in the projection V. 

o p' — 1 would give more importance to the spatial coherence and will generate extremely smooth 
features (see next section for some numerical experiments). 


Computational Cost The computational cost of Algorithm is the same as the other NMU vari¬ 
ants: it requires 0{mn) operations to compute each rank-one factor, for a total of 0{mnr) operations. 
Most of the cost resides in the matrix-vector products {Av and A^u for the updates of u and v) and 
the update of the Lagrangian multipliers A. Hence, the cost is linear in m, n and r, as for most NMF 
algorithms (in particular the one described in the next section). 


Convergence The convergence analysis of Algorithm is nontrivial, because it combines several 
strategies that are themselves difficult to analyze. In fact, our algorithm is based on a Langragian 
relaxation (whose subproblems are solved using a two-block coordinate descent method) where the 
Lagrangian variables A are updated to guarantee the convergence of the scheme [2] (see also the 
discussion in [8|). This is combined with a reweighted least squares approach to approximate the non- 
differentiable norm ||A^i/||i, which is also guaranteed to converge [5]. In practice, we have observed 
that Algorithm converges usually within 500 iterations (but this depends in particular on the size 
of the data set); see Section 3.1 for some numerical experiments. 






3 Experimental Results 


In this section, we conduct several experiments to show the effectiveness of PNMU. 

In the first part, we use synthetic data sets for which we know the ground truth hence allowing us 
to quantify very precisely the quality of the solutions obtained by the different algorithms. 

In the second part, we validate the good performance of PNMU on real data sets, and compare 
the algorithms on two widely used data sets, namely, the Cuprite hyperspectral image and the CBCL 
face data set. 

The following algorithms will be compared: 

o Nonnegative matrix factorization (NMF). We use the accelerated HALS algorithm (A- 
HALS) [9] (with parameters ol — 0.5 and 6: = 0.1, as suggested by the authors). 

o Sparse NMF (SNMF). It is a sparse version of A-HALS [6] that adds an ^i-norm sparsity¬ 
enhancing term for the abundance matrix U. We run SNMF with target sparsity for the matrix 
U given by the sparsity of the abundance matrix U obtained by PNMU. Hence we can compare 
PNMU with SNMF meaningfully as they will have comparable sparsity levels, as it was done 
in [8] to compare NMU with SNMF. 

o Nonnegative matrix underapproximation (NMU). This is PNMU with cp' = 0 and fi' = 0. 

o Nonnegative matrix underapproximation with local information (LNMU). This is 
PNMU with ip' = 0. 

o Sparse nonnegative matrix underapproximation (SNMU). This is PNMU with p' = 0. 

o Nonnegative matrix underapproximation with prior information (PNMU). This is 
Algorithm 

All the numerical results are obtained by implementing the algorithms in Matlab 7.8 codes and 
running them on a machine equipped with an Intel® Xeron® CPU E5420 Dual Core 250 GHz, RAM 
8.00 GB. The code is available online from https://sites.google.com/site/nicolasgillis/code, 
We use maxiter = 500, iter = 10 for all experiments in the paper. 

3.1 Synthetic data sets 

Let us construct synthetic hyperspectral images. We use r — A materials perfectly separated. Materials 
are adjacent rectangles of size 10 x (A:+l) pixels with A; = 1, 2, 3,4, so that the image at each wavelength 
contains 10 x 14 pixels (m = 140); see the top row of Figure]^ Mathematically, for 1 < i < 140 and 
1 < A; < r, 

1 if5(A;-l)(A; + 2) + l<i<5(A:-l)(A; + 2) + 10(A; + l), 

[0 otherwise. 

We use n — 20 wavelengths. The spectral signatures are sinusoids with different phases plus a constant 
to make them nonnegative: 

U(j,A:) = 1.1+ sin(x(j)+ (/)(A:)) > 0, 

where x{j) = j ^ with j = 1, 2,..., n, and (^{k) = (A: — 1)^ with A: = 1, 2,..., r. We also permute the 
rows of V so that adjacent materials have less similar spectral signatures (we used the permutation 
[13 2 4]). Figure shows the spectral signatures. 

For the noise, we combine Gaussian and salt-and-pepper noise. Let us denote M = 1.1 the average 
of the entries of UV (which is the noiseless synthetic HSI): 
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Figure 2: Synhetic spectral signatures. 


o Gaussian. We use 

^ ^MA/’(0,1), 1 < i < n, 1 < j < m, 

where g is the intensity of the noise. We use the function randn(n,m) of Matlab. 

o Salt-Pepper. For the salt-and-pepper noise, we use a sparse matrix P of density p whose non-zero 
entries are distributed following MA/’(0,1). We use the function sprandn(n,m,p) of Matlab. 

Finally, we generate the noisy hyperspectral image M = UV -\-G-\- P] see Figure for an example. 
Given a solution ([/, V) generated by an algorithm using matrix M, we will compare the quality of 



Figure 3: Example of a synthetic HSI with g = 30% and p — 15%. Each image corresponds to a 
wavelength. 


the r extracted materials as follows. First, we normalize the columns of U so that the maximal entries 


are equal to one, that is, t7(:, k) 
to minimize 




maxi U(i^k) 

1 

match(U,U) = — \\U(:,k) - U{:,k)\\‘^ 

rp / ^ 


mr 


for all k. Then, we permute the columns of U in order 

2 ^ [0,1]. (3.1) 


k=i 


The match will be equal to zero if U coincides with U (up to a permutation of the columns), and 
equal to 1 when [/(i. A:) = 0 k) = 1 for all i, k (perfect mismatch). 

Note that for our particular f7, the match value of the zero matrix is equal to match(?7,0) = 25% 
(which is the density of U) hence we should not expect values higher than that in our numerical 
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experiments. As we will see, match values higher than 2% correspond to a relatively poor recovery of 
the matrix [/; see Figure 


3.1.1 Choice of (j)' and ji' 

First, let us observe the influence of 0' and fi' on PNMU and choose reasonable values for these 
parameters for these synthetic data sets. We fix the noise level to ^ = 0.2 and p — 0.05, and vary 0' 
and fi'. Figure shows the average match of PNMU over 20 randomly generated synthetic data sets 
for different values of 0' and p'. 

We observe that values of 0' G [0.6,0.9] and p' G [0.1, 0.5] provide good values for the match (below 
1%, which is better than the other NMU and NMF algorithms; see Figure]^. For the comprehensive 
comparison of the different algorithms in the next section, we will use (j)' = 0.7 and p' = 0.5 for LNMU, 
SNMU and PNMU. 



Figure 4: Average value of the match (3.1) of PNMU over 20 randomly generated synthetic HSI with 
^ = 0.2 and p = 0.05 for PNMU for different values of 0' and p'. 


Figurej^shows the materials extracted by NMU, NMF, SNMF (with sparsity 0.75), LNMU, SNMU 
and PNMU with (p' = 0.7 and p' = 0.5 for the noise level g = 0.3 and p = 0.15. We observe that 
PNMU identifies perfectly the four materials. SNMU performs relatively well but returns the basis 
elements with salt-and-pepper noise. LNMU returns spatially more coherent basis elements but they 
are mixture of several materials (hence not sparse enough). NMF and SNMF both return spatially 
less coherent basis elements, and SNMF returns more localized (sparser) ones identifying relatively 
well the materials (except for the smallest one). 

Remark 1. The code to generate these synthetic HSI, and compare the different algorithms is available 
from https:// sites. google, com/ site/nicolasgillis/ . Moreover, one can easily change the 
noise level, the number and the size of the materials, and the number of wavelengths. The values 
g = 0.3 and p — 0.15 were chosen because they correspond to a high noise level where PNMU still 
performs perfectly (see Figure\^. As we will see below, PNMU outperforms the other approaches for 
most reasonable values of the noise level. 

Regarding the convergence of PNMU, although it is difficult to analyze rigorously (cf. the discussion 
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Figure 5: Basis elements extracted by the different algorithms for the synthetic HSI with g — 0.3 
and p = 0.15 of Figure From top to bottom: True materials, NMF, SNMF with sparsity 0.75, 
NMU, LNMU, SNMU and PNMU with 0' = 0.8 and ji' = 0.5. The match values are: (NMF) 14.34%, 
(SNMF) 1.64%, (NMU) 13.17%, (LNMU) 2.12%, (SNMU) 7.26%, (PNMU) 0.003%. 


in the previous section). Figure [^displays the difference between iterates for this particular numerical 
experiment, showing the relative fast convergence in this case. 




Figure 6: Convergence for each rank-one factor generated by PNMU on the synthetic HSI with 
g — 0.3 and p = 0.15 from Figure Each plot displays the evolution of I 112 and 1— 

where is the tth iterate generated by PNMU for each rank-one factor uv^ computed 

sequentially (from left to right, top to bottom). 
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3.1.2 Comparison 

Let us vary the noise level in three different ways: 

o We fix 2? = 5% and we vary ^ = 0, 0.05,0.1,..., 1. Fignreshows the average value of the match 
in percent for the different algorithms. We observe that for all values of g smaller than 55%, 
PNMU outperforms the other approaches, being able to generate basis elements with match 
much smaller than 0.5% (in average, 0.12%). For higher noise levels, the performance of PNMU 
degrades rapidly. SNMF performs relatively well although the match is always higher than 0.7%. 
For high noise levels, it performs better than PNMU although for these values of the noise, all 
basis elements generated are relatively poor. 

The other approaches perform rather poorly, with most of the match values higher than 5%. 

o We fix ^ = 10% and we use p = 0,0.01,..., 1. Figure shows the average value of the match 
in percent for the different algorithms. PNMU performs best up to p = 35% (which is rather 
high as 35% of the entries of M are highly perturbed) with match smaller than 0.15% for all 
p < 0.1, and average match of 0.22% for 0.1 < p < 0.2. It is followed by SNMF while the other 
approaches perform relatively poorly. 


o We use g = 0.02 q and p = 0.01 q with q = 0,1, 2,..., 50. Figure shows the average value 
of the match in percent for the different algorithms. Up to g — 20, PNMU performs best with 
average match values smaller than 1%. For p > 23 which is rather high (p = 46% and p = 23%), 
PNMU deteriorates rapidly but, although SNMF has the lowest match values, they are relatively 
high and correspond to poor basis elements (and we see that LNMU has similar match values 
for these high noise levels). 


In all cases, we observe that PNMU outperforms the other NMF and NMU variants where the noise 
regime is reasonable. In particular, it performs very well (match < 1%) for values up to p = 0.1,p = 0.3 
(Figure [Tb]), p = 0.55,p = 0.05 (Figure [7^ and p = 0.4,p = 0.2 (Figure [7a|), which is not the case for 
the other algorithms. 


3.2 Real-world data sets 

In this section, we first illustrate the sensitivity of PNMU with respect to the parameters p' and p' 
on the Cuprite hyperspectral image. Not surprisingly, as we have already observed in the previous 
section, these parameters play a critical role. However, as opposed to classical NMU that is completely 
unsupervised, NMU with prior information may require human supervision for tuning its parameters 
and choosing a good trade-off between the reconstruction accuracy, and the spatial coherence and 
sparsity of the features. 

Then, a quantitative analysis is conducted. Because the ground truth is not know for these data 
sets, we use three measures: the relative error in percent 

\\M-UV^\\„ 

Relative Error = 100-^^-r-—^ 

the sparsity (percentage of nonzero entries) of the factors U 

s (U) = 100 #^^^°^ (^) g [ 0 ^ 100] for U G 
mr 
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(a) Evolution of the average match value (3.1) over 20 different synthetic data sets for different values 
of the Gaussian noise level {g) and for a fixed salt-and-pepper noise level (p = 5%). 



(b) Evolution of the average match value (3.1) over 20 different synthetic data sets for different values 
of the salt-and-pepper noise level (p) and for a fixed the Gaussian noise level {g = 10%). 



(c) Evolution of the average match value (3.1) over 20 different synthetic data sets for different values 
of the Gaussian noise level {g = 0.02^^) and the salt-and-pepper noise level (p = O.Olg^). 
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and the spatial coherence of the basis elements 


m = E 

k=l 


\\U{:,m2 ■ 


Recall that A:)||i amounts for the spatial coherence of the fcth column of the abundance matrix 

U^ while we normalize the columns of U to have a fair comparison (since the algorithms do not 
normalize the columns of U in the same way). The aim of our experiments is to compare the algorithms 
in terms of the trade-off between these three measures. 

Recall that \\NU{:^k)\\i amounts for the spatial coherence of the kth column of the abundance 
matrix U^ while we normalize the columns of U to have a fair comparison (since the algorithms do 
not normalize the columns of U in the same way). The aim of our experiments is to compare the 
algorithms in terms of the trade-off between these three measures. 

Adding constraint to the factorization process leads to an increase of the approximation error, but 
the constrained variants return sparser and/or spatially more coherent solutions. As already noted 
in [8], it is not fair to compare directly the approximation error of NMU with NMF because NMU 
computes a solution sequentially. In order to compare the quality of the generated sparsity patterns, 
one can postprocess the solutions obtained by all algorithms by optimizing over the non-zero entries 
of U and V (A-HALS can be easily adapted to handle this situation). We will refer to ^‘Improved^’ as 
the relative approximation after this post-processing step. 

Two sets of experimentation are conducted. The aim of the first experiments is to show the 
behavior of the proposed method, and to test its effectiveness in correctly detecting the endmembers 
in a widely used hyperspectral image, the Cuprite image, and reducing noise in the abundance maps. 
The second experiment shows the effectiveness of PNMU in correctly detecting the parts of facial 
images on one of the most widely used data set in the NMF literature, namely the MIT-CBCL face 
data set. 


3.2.1 Cuprite 

The Cuprite data sefQ is widely used to assess the performance of blind hyperspectral unmixing 
techniques. It represents spectral data collected over a mining area in southern Nevada, Cuprite. It 
consists of 188 images with 250 x 191 pixels containing about 20 different endmembers (minerals), 
although it is unclear how many minerals are present and where they are located precisely; see, 
e.g., na n for more information. First we discuss the influence of the parameters ji' and Lp' on 
the abundances obtained with PNMU. Then a quantitative comparison is performed to show the 
effectiveness of PNMU compared to the tested NMU and NMF variants. 


Sensitivity of PNMU to the Parameters fi' and (p' In this section, we conduct some experiments 
to show the sensitivity of PNMU to the parameters ji' and p'. To show a wide range of abundance 
elements for different values of p' and p'^ we only extract 7 abundance columns (r = 7, light tones in 
the figures indicate a high degree of membership). Figure shows the first seven abundance images 
extracted by PNMU, when the penalty for sparsity is fixed {p' = 0.2) and the penalty for spatial 
coherence varies {p' = 0.1, 0.3,..., 0.9). We observe that increasing p' improves the spatial coherence 
of the abundance images. However, high values of p' (starting for p' = 0.5) lead to blurred images 
and lost of contours (sub-figures and 7h). 

Adding sparsity constraints into NMU allows to detect endmembers more effectively because spar¬ 
sity prevents the mixture of different endmembers in one abundance element [IT] . Figureshows the 

^Available at http : // speclab. cr. usgs . gov/PAPERS. imspec . evol/aviris . evolution. html 
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(g) PNMU with parameters /i' = 0.7 and Lp' = 0.2 . 



Figure 7: Influence of the locality term on the first seven abundance images for the Cuprite data set. 
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first seven bases obtained with PNMU for /i' fixed to 0.1 and where ip' varies {ip' = 0.1, 0.3,..., 0.9). It 
can be observed that, as for the spatial information, small values of the sparsity parameter p)' (0.1 or 
0.3) provide good results; in fact, for higher values of this parameter, the abundances are too sparse, 
and PNMU is not able to detect good features; see sub-figures and 

Furthermore, it is necessary to point out that the sparsity and the locality constraints influence 
each other. Indeed, higher sparsity implies that less pixels are present in each abundance element 
which improves the spatial coherence (at the limit, all pixels take the value zero and the spatial 
coherence is perfect). For example, we observe on Figure]^ that increasing p' also strongly influences 
sparsity. A human supervision is necessary to adequately tune these parameters. For the Cuprite 
data set, we have observed that PNMU with /i' = 0.1 and ip' — 0.2 returns basis elements with a good 
tradeoff between sparsity and spatial coherence. 


7h 


Quantitative Comparison We now compare PNMU with the algorithms listed in the introduction 
of Section to show its ability to extract features with a good trade-off between reconstruction error, 
sparsity and spatial coherence. Figures and display the abundance images obtained by the 
different algorithms for r = 21 on the Cuprite data set. Table [T] provides a quantitative comparison 
of the different algorithms, with the values of the relative error, sparsity and spatial coherence. 



NMF 

SNMF 

NMU 

LNMU 

SNMU 

PNMU 

Error 

0.62 

2.59 

1.37 

1.44 

1.87 

1.85 

Improved 

0.61 

2.02 

0.71 

0.66 

1.13 

1.09 

s{U) 

3.76 

77.96 

50.41 

40.90 

76.33 

75.29 

HU) 

3606 

3829 

2585 

2039 

2292 

1381 


Table 1: Comparison of the relative approximation error, the sparsity and the spatial coherence for 
the Cuprite data set. 


We observe the following: 


o NMF is not able to detect endmembers, and generates mostly dense hence not localized abun¬ 
dance images, most of them containing salt-and-pepper-like noise; see sub-figure 

This qualitative observation is confirmed by the quantitative measurements from Table al¬ 
though NMF has the lowest reconstruction error (since it only focuses on minimizing it), it has 
the lowest value for the sparsity of the abundances, and the second highest value for their spatial 
coherence (higher values mean worse spatial coherence). 


o Despite the fact that SNMF gives sparser solutions than NMF, the abundance images are sur¬ 
prisingly very poor in terms of local coherence (highest value); see sub-figure |9b[ 

o NMU returns sparse abundances, and is able to localize different endmembers (see the description 
of the PNMU abundance maps below), even if some images are rather noisy and the edges 
defining the materials are not always well identified; see, e.g., 11th and 16th abundance elements 
in sub-figure 1^ It generates sparser and spatially more coherent features than NMF, while its 
relative error is comparable (0.61% vs. 0.71%). 


o Not surprisingly, LNMU provides spatially more coherent (from 2585 to 2039) but denser (from 
50.41 to 40.90) bases than NMU. In particular, the edge delimitation of several materials is 
better defined in the LNMU abundance maps, although some are still rather noisy (e.g., 13th, 


17th, 19th) and relatively dense (e.g., 6th, 8th); see sub-figure 10a 
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(a) PNMU with parameters /r' = 0.1 and (p' = 0.1. 



(b) PNMU with parameters /i' = 0.1 and Lp' = 0.3. 



(c) PNMU with parameters /i' = 0.1 and ip' 


(d) PNMU with parameters p' = 0.1 and ip‘ 


(e) PNMU with parameters /i' = 0.1 and (/?' = 0.9 . 


Figure 8: Influence of the sparsity term on the first seven abundance images for the Cuprite data set. 
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o Adding the penalty term inducing sparsity on NMU allows to remove some of the noise, though 
the edges of some materials are still not well defined (sub-figure 10b). In fact, compared to 
NMU, SNMU quite naturally increases the sparsity of NMU abundances (from 50.41 to 76.33) 
and improves the spatial coherence (from 2585 to 2292), although the relative error is slightly 
increased (from 0.71% to 1.13%). 

o PNMU achieves the best trade-off between relative error (1.1% vs. 0.6% for NMF hence an 
increase of only 0.5%, while NMF clearly generates poor basis elements because they are mostly 
dense and noisy-see the discussion above), sparsity (close to that of SNMF, 77.96% vs. 75.29%), 
and spatial coherence (PNMU achieves the lowest value). Sub-figure 10c shows the abundance 


images obtained with PNMU for the Cuprite data set with parameters /i' = 0.1 and — 0.2. It 
can be observed that the images are less noisy (as for SNMU), and the edges of materials are well 
defined (as for LNMU). Hence, PNMU combines the advantages of all methods on this data set. 
In fact, it is able to identify many endmembers (see where these materials are identified) 

among which, from left to right, top to bottom (snb-fignre |10c| ): (2) Desert Varnish, (3) Alunite 1, 
(5) Chalcedony and Hematite, (6) Alunite 2, (7) Kaolinite 1 and Goethite, (9) Amorphous iron 
oxydes and Goethite, (10) Chalcedony, (12) Montmorillonite, (13) Muscovite, (14) Kaolinite 1, 
(15) K-Alunite, (18) Kaolonite 2, and (20) Buddingtonite. 


3.2.2 MIT-CBCL Face database 

The previous experiments show the effectiveness of the proposed method in identifying materials that 
compose a hyperspectral image. We now apply PNMU on MIT-CBCL Face databas^with 2429 faces, 
19 X 19 pixels each. This is one of the most widely used data set in the NMF literature and was used 
in the original paper of Lee and Seung [18]. This is a database of faces and non-faces images, used 
at the Center for Biological and Computational Learning at MIT and we use the faces subset of the 
training set. 

We use this data set to show that PNMU can also be used for other types of images, and provides 
meaningful and useful results compared to NMF and NMU variants. As for the Cuprite data set, we 
will use /i' = 0.1 and cp' = 0.2. 

Figure [T^ displays the abundance images obtained with the different algorithms, while Table 
reports the numerical results. As for hyperspectal images, PNMU provides a good trade-off between 
reconstruction error, sparsity and spatial coherence. 



NMF 

SNMF 

NMU 

LNMU 

SNMU 

PNMU 

Error 

8.18 

8.67 

15.29 

15.90 

15.50 

14.63 

Improved 

8.16 

8.46 

8.76 

8.46 

9.80 

9.52 

s{U) 

54.89 

84.80 

70.05 

58.65 

87.69 

85.77 

HU) 

457 

310 

419 

55 

334 

271 


Table 2: Comparison of the relative approximation error, the sparsity and the spatial coherence for 
the MIT-CBCL data set. 


NMF, NMU and LNMU generate mixed abundances where different parts of the faces are rep¬ 
resented (for example in the sub-figure lie, the third abundance image in the fourth row, includes 
the nose, the mouth, the eyes and the eyebrows all together). The sparse methods (SNMU, PNMU, 
SNMF) are able to detect unmixed parts of the faces, but among these, the parts returned by PNMU 


^http://cbcl.mit.edu/software-datasets/FaceData2.html 
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(a) NMF. 



(b) SNMF. 



(c) NMU. 

Figure 9: Abundance elements obtained for the Cuprite data set with NMF, SNMF and NMU algo¬ 
rithms. 


20 









































(b) SNMU. 
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(c) PNMU. 

Figure 10: Abundance elements obtained for the Cuprite data set with LNMU, SNMU and PNMU. 
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are more localized. In fact, PNMU has a sparsity level comparable with the other methods enhancing 
sparsity, but has a better spatial coherence. 


4 Conclusions 

In this paper a variant of NMU was proposed, namely PNMU, taking into account both sparsity and 
spatial coherence of the abundance elements. Numerical experiments have shown the effectiveness 
of PNMU in correctly generating sparse and localized features in images (in particular, synthetic, 
hyperspectral and facial images), with a better trade-off between sparsity, spatial coherence and re¬ 
construction error. 
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